Antipattern: One Size Fits All

Let's see some caveats of using database keys traditionally.

Books, articles, and programming frameworks have established a cultural convention that every database table must have a primary key column with the following characteristics:

  • The primary key’s column name is id.

  • Its data type is a 32-bit or 64-bit integer.

  • Unique values are generated automatically.

The presence of a column named id in every table is so common that this has become synonymous with a primary key. Programmers learning SQL get the false idea that a primary key always means a column defined in this manner.

Creating the Bugs table

Adding an id column to every table causes several effects that make its use seem arbitrary.

Making a redundant key#

We may sometimes see an id column defined as the primary key simply for the sake of tradition, even when another column in the same table could be used as the natural primary key. The other column may even be defined with a UNIQUE constraint. For example, in the Bugs table, we could label bugs using a string with a mnemonic for the project the bug belongs to or other identifying information.

Making a redundant key

Let’s run the query in the following playground to see the effect on the database.

Retrieving data after making a redundant key

The bug_id column in the example has similar usage to id in that it serves to identify each row uniquely.

Allowing duplicate rows#

A compound key consists of multiple columns. One typical use for a compound key is in an intersection table like BugsProducts. The primary key should ensure that a given combination of values for bug_id and product_id appears only once in the table, even though each value may appear many times in different pairings.

However, when we use the mandatory id column as the primary key, the constraint no longer applies to two columns that should be unique.

Creating the BugsProducts table

Let’s try to run the code in the playground below.

Retrieving data from the BugsProducts table

We’ll find that there are 3 records for the same bug_id and the product_id. This means that these records are duplicated.

Duplicates in this intersection table cause unintended results (because of duplicated records) when we use the table to match Bugs with Products. To prevent duplicates, we can declare a UNIQUE constraint over the two columns besides id:

Creating BugsProducts table with constraint

But if we need a unique constraint over those two columns anyway, the id column is superfluous.

Obscuring the meaning of the key#

The word “code” has a number of definitions, one of which is a way to communicate a message with brevity or secrecy. In programming, we should have the opposite goal — to make the meaning clearer.

The name id is so generic that it holds no meaning. This is especially important when we join two tables, and they have the same primary key column name.

Joining two columns having the same primary key name

How do you tell the bug id from the account id in your application code if you reference columns by name instead of by ordinal position? This is a problem, especially in dynamic languages like PHP, when a query result is an associative array: one column overwrites the other unless you specify column aliases in your query.

The name of the id column doesn’t help make the query any clearer. If, instead, the columns were named bug_id and account_id, the reader would have a much easier time reading the query results. We use a primary key to address individual rows of a table, so the column’s name should give a clue about the type of entity in that table.

Using USING#

A lot of us are probably already familiar with the SQL syntax for a join — using the keywords JOIN and ON before an expression to evaluate matching rows in the two tables.

Let’s try to run this query in the next playground.

Using JOIN and ON in a query

SQL also supports a more concise syntax for expressing a join between two tables. We can rewrite the previous query in the following way if the columns have the same name in both tables:

Using the USING in a query

However, if all tables are required to define a pseudo key as a primary key named id, then a foreign key column in a dependent table can never use the same name as the primary key it references. Instead, we must always use the more verbose ON syntax:

Using JOIN and ON in a query

Compound keys are hard#

Some developers refuse to use compound keys because they feel that these keys are too hard to use. Any expression that compares a key to another must compare all columns. A foreign key that references a compound primary key must itself be a compound foreign key. It requires more typing to use compound keys.

This refusal is like a mathematician refusing to use two-dimensional or three-dimensional coordinates, and insisting on performing all calculations as though objects exist within a one-dimensional, linear space. It’s true that this would make a lot of geometry and trigonometry much simpler, but it fails to describe real-world objects that we need to work with.

Synopsis: ID Required
Solution: Tailored to Fit
Mark as Completed
Report an Issue